AWK's Numerical Functions

By Bruce Barnett

This series of articles relating to Sun administration issues is by Bruce Barnett. Barnett has used Sun computers for six years, including two years as a system administrator. He can be reached at barnett@crd.ge.com.

In previous articles, I have shown how useful AWK is in manipulating information, and generating reports. When you add a few functions, AWK becomes even more, mmm, functional.

I originally didn't plan on discussing the differences between AWK and NAWK, but there are many subtle differences that are worth knowing about. Since all Suns have AWK and NAWK, you can pick which one you want to use, based on the features you need. Remember, all AWK functions are also in NAWK.

There are three types of functions: numeric, string and whatever's left. In this article, I will only discuss the numeric functions. Table 1 lists all of the numeric functions:


Table 1: Numeric Functions

Name    Function        Variant
cos     Cosine          AWK
exp     Exponent        AWK
int     Integer         AWK
log     Logarithm       AWK
sin     Sine            AWK
sqrt    Square Root     AWK
atan2   Arctangent      NAWK
rand    Random          NAWK
srand   Seed Random     NAWK

Trigonometric functions

Oh joy! I bet millions, if not dozens, of my readers have been waiting for me to discuss trigonometry. Personally, I don't use trigonometry much at work, except when I go off on a tangent. Sorry about that. I don't know what came over me. I don't usually resort to puns. I'll write a note to myself, and after I sine the note, I'll have my boss cosine it.

Now stop that! I hate arguing with myself. I always lose. Thinking about math that I learned in the year 2 B.C. (Before Computers) seems to cause flashbacks of high school, pimples and (shudder) times best left forgotten. The stress of remembering those days must have made me forget the standards I normally set for myself. Besides, no-one appreciates obtuse humor anyway, even if I find acute way to say it.

I better change the subject fast. Combining humor and computers is a very serious matter.

Here is a NAWK script that calculates the trigonometric functions for all degrees between 0 and 360. It also shows why there is no tangent, secant or cosecant function. (They aren't necessary.) If you read the script, you will learn of some subtle differences between AWK and NAWK. All this in a thin veneer of demonstrating why we learned trigonometry in the first place. What more can you ask for?

#!/usr/bin/nawk -f
#
# A smattering of trigonometry...
#
# This AWK script plots the values from 0 to 360
# for the basic trigonometry functions
# but first - a review:
#
#
# Assume the following right triangle
#
#        Angle Y
#
#        |\
#        | \
#        |  \
#      a |   \ c
#        |    \
#        |     \
#        +------- Angle X
#            b
#
# since the triangle is a right triangle, then
#	X+Y=90
#
# Basic Trigonometric Functions. If you know the length 
# of 2 sides, and the angles, you can find the length of the third side. 
# Also - if you know the length of the sides, you can calculate 
# the angles.
#
# The formulas are
#
#	sine(X) = a/c
#	cosine(X) = b/c
#	tangent(X) = a/b
#
# reciprocal functions
#	cotangent(X) = b/a
#	secant(X) = c/b
#	cosecant(X) = c/a
#
# Example 1

# if an angle is 30, and the hypotenuse (c) is 10, then 
#	a = sine(30) * 10 = 5
#	b = cosine(30) * 10 = 8.66
#
# The second example will be more realistic: 
#
#	Suppose you are looking for a Christmas tree, and
# while talking to your family, you smack into a tree 
# because your head was turned, and your kids were arguing over who 
# was going to put the first ornament on the tree. 
#
# As you come to, you realize your feet are touching the trunk of the tree,
# and your eyes are six feet from the bottom of your frostbitten toes. 
# While counting the stars that spin around your head, you also realize 
# the top of the tree is located at a 65 degree angle, relative to your eyes. 

# You suddenly realize the tree is 12.87 feet high! After all, 
#	tangent(65 degrees) * 6 feet = 12.87 feet

# All right, it isn't realistic. Not many people memorize the 
# tangent table, or can estimate angles that accurately. 
# I was telling the truth about the stars spinning around the head, however. 

#
BEGIN {
# assign a value for pi.
PI=3.14159;
# select an "Ed Sullivan" number - really really big 
BIG=999999;
# pick two formats
# Keep them close together, so when one column is made larger 
# the other column can be adjusted to be the same width 
	fmt1="%7s %8s %8s %8s %10s %10s %10s %10s\n";
	fmt2="%7d %8.2f %8.2f %8.2f %10.2f %10.2f %10.2f %10.2f\n";
# print out the title of each column
# old AWK wants a backslash at the end of the next line
# to continue the print statement
# new AWK allows you to break the line into two, after a comma
printf(fmt1,"Degrees","Radians","Cosine","Sine", \
"Tangent","Cotangent","Secant", "Cosecant");

for (i=0;i<=360;i++) {
# convert degrees to radians
r = i * (PI / 180 );
# in new AWK, the backslashes are optional
# in OLD AWK, they are required
printf(fmt2, i, r, \
# cosine of r
cos(r), \
# sine of r
sin(r), \
#
# I ran into a problem when dividing by zero. 
# So I had to test for this case.
#
# old AWK finds the next line too complicated 
# I don't mind adding a backslash, but rewriting the 
# next three lines seems pointless for a simple lesson. 
# This script will only work with new AWK, now - sigh... 
# On the plus side,
# I don't need to add those back slashes anymore 
#
# tangent of r
(cos(r) == 0) ? BIG : sin(r)/cos(r),
# cotangent of r
(sin(r) == 0) ? BIG : cos(r)/sin(r),
# secant of r
(cos(r) == 0) ? BIG : 1/cos(r),
# cosecant of r
(sin(r) == 0) ? BIG : 1/sin(r));
}
# put an exit here, so that standard input isn't needed. 
	exit;
}

NAWK also has the arctangent function. This is useful for some graphics work, as

arc tangent(a/b) = angle (in radians)

Therefore if you have the X and Y locations, the arctangent of the ratio will tell you the angle. The atan2() function returns a value from negative pi to positive pi.
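
To make that concrete, here is a minimal sketch, wrapped in a shell one-liner so it can be pasted at a prompt. It assumes a new AWK (nawk, gawk or any POSIX awk) is installed as awk. The point (1,1) and the variable names are only illustrations and do not come from the script above. Note that atan2() takes the Y value first, and that atan2(0, -1) is a handy way to get pi without typing in the digits.

```shell
# Angle of the point (x, y) = (1, 1), converted from radians to degrees.
angle=$(awk 'BEGIN {
    PI = atan2(0, -1)      # pi computed rather than typed in
    x = 1; y = 1           # illustrative coordinates
    printf("%.1f", atan2(y, x) * 180 / PI)
}')
echo "$angle"              # prints 45.0
```

Because atan2() gets both values rather than their ratio, it can tell the four quadrants apart and does not divide by zero when x is 0.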

Exponents, logs and square roots

The following script uses three other arithmetic functions: log, exp, and sqrt. I wanted to show how these can be used together, so I divided the log of a number by two, which is another way to find a square root. I then compared the value of the exponent of that new log to the built-in square root function. I then calculated the difference between the two, and converted the difference into a positive number.

#!/bin/awk -f
# demonstrate use of exp(), log() and sqrt in AWK 
# e.g. what is the difference between using logarithms and regular arithmetic 
# note - exp and log are natural log functions - not base 10 
#
BEGIN {
# what is the amount of error that will be reported?
ERROR=0.000000000001;
# loop a long while
for (i=1;i<=2147483647;i++) {
# find log of i
logi=log(i);
# what is square root of i?
# divide the log by 2
logsquareroot=logi/2;
# convert log of i back
squareroot=exp(logsquareroot);
# find the difference between the logarithmic calculation 
# and the built in calculation
diff=sqrt(i)-squareroot;
# make difference positive
if (diff < 0) {
diff*=-1;
}
if (diff > ERROR) {
printf("%10d, squareroot: %16.8f, error: %16.14f\n", \
i, squareroot, diff);
}
}
exit;
}

Yawn. This example isn't too exciting, except to those who enjoy nitpicking. Expect the program to reach 3 million before you see any errors. I'll give you a more exciting sample soon.
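
If you don't want to wait for the loop, a much smaller sketch shows the same identity. It assumes any AWK installed as awk; the numbers 16 and 27 are arbitrary choices. Dividing the log by 3 instead of 2 gives a cube root, which sqrt() cannot do.

```shell
# Square root of 16 two ways, then the cube root of 27 using logs.
roots=$(awk 'BEGIN {
    printf("%.6f %.6f %.6f", sqrt(16), exp(log(16) / 2), exp(log(27) / 3))
}')
echo "$roots"              # prints 4.000000 4.000000 3.000000
```

At six decimal places the two methods agree; the tiny differences the longer script hunts for only show up much further to the right.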

Truncating Integers

All versions of AWK contain the int function. This truncates a number, making it an integer. It can be used to round non-negative numbers by adding 0.5:

printf("rounding %8.4f gives %8d\n", x, int(x+0.5));
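
Here is a small sketch of that rounding trick, assuming any AWK installed as awk. The three sample values are arbitrary. The last one is negative on purpose: int() truncates toward zero, so adding 0.5 rounds negative numbers the wrong way.

```shell
# int(x + 0.5) for 2.4, 2.5 and -2.6
rounded=$(awk 'BEGIN {
    printf("%d %d %d", int(2.4 + 0.5), int(2.5 + 0.5), int(-2.6 + 0.5))
}')
echo "$rounded"            # prints 2 3 -2
```

The value -2.6 should round to -3, but the trick gives -2. If negative input is possible, subtract 0.5 instead of adding it for those values.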

Random Numbers

NAWK has functions that can generate random numbers. The function rand returns a random number between 0 and 1. Here is an example that calculates a million random numbers between 0 and 100, and counts how often each number was used:

#!/usr/bin/nawk -f
# old AWK doesn't have rand() and srand()
# only new AWK has them
# how random is the random function?
BEGIN {
#	srand();
i=0;
while (i++<1000000) {
x=int(rand()*100 + 0.5);
y[x]++;
}
for (i=0;i<=100;i++) {
printf("%d\t%d\n",y[i],i);
}
exit;
}

If you execute this script several times, you will get the exact same results. Experienced programmers know random number generators aren't really random, unless they use special hardware. These numbers are pseudo-random, and calculated using some algorithm. Since the algorithm is fixed, the numbers are repeatable unless the generator is seeded with a unique value. This is done using the srand function above, which is commented out. Typically the random number generator is not given a special seed until the bugs have been worked out of the program. There's nothing more frustrating than a bug that occurs randomly. The srand function may be given an argument. If not, it uses the current time and day to generate a seed for the random number generator.

The Lotto script

I promised a more useful script. This may be what you are waiting for. It reads two numbers, and generates a list of random numbers. I call the script "lotto.awk". If you had to pick 6 numbers from 1 to 54, you could type:

echo 6 54 | lotto.awk

and the program would provide some numbers for you.


#!/usr/bin/nawk -f
# this example calculates suitable numbers
# for a lottery ticket
# it reads one line of information from standard input, and
# prints out a numerically sorted list of random numbers.
# the first field specifies how many numbers to print
# the second field specifies the maximum random number
BEGIN {
# initialize seed
srand();
}
{
# first field is the number of numbers generated
# second field is the maximum number
if (NF != 2) {
printf("input should have two fields\n") >"/dev/tty";
} else {
NUM=$1;
MAX=$2;

# initialize number found
Number=0;
while (Number < NUM) {
r=int((rand() * MAX) + 0.5);
# have I seen this number before?
if (array[r] == 0) {
#	no, I have not
Number++;
array[r]++;
}

}

# now output all numbers, in order
for (i=1;i<=MAX;i++) {
# is it marked in the array?
if (array[i]) {
# yes
printf("%d ",i);
# erase it for next time
array[i]=0;
}
}
printf("\n");
}
}

If you do win a lottery, send me a postcard.


String Functions in AWK

Soon after I finished a recent column, I realized I made a slight mistake. No, wait! I was merely testing my readers to see if they are paying attention. Yeah. I gave a formula for calculating random numbers within a particular range. Well, it didn't always work. I realized this right after they announced the winners for the $70 million lottery, and I didn't win. Truthfully, the program would occasionally output the improper amount of numbers. The error occurred randomly, of course. The distribution of numbers wasn't as even as I wanted, either. The correct code fragment is below:

#!/usr/bin/nawk -f
BEGIN {
# Assume we want 6 random numbers between 1 and 36
# We could get this information by reading standard input,
# but this example will use a fixed set of parameters.
#
# First, initialize the seed
srand();
# How many numbers are needed?
NUM=6;
# what is the minimum number
MIN=1;
# and the maximum?
MAX=36;
# How many numbers will we find? start with 0
Number=0;
while (Number < NUM) {
r=int(((rand() *(1+MAX-MIN))+MIN));
# have I seen this number before?
if (array[r] == 0) {
# no, I have not
Number++;
array[r]++;
}
}

# now output all numbers, in order
for (i=MIN;i<=MAX;i++) {
# is it marked in the array?
if (array[i]) {
# yes
printf("%d ",i);
}
}
printf("\n");
exit;
}
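
To see why the older formula misbehaved, here is a sketch that feeds both formulas the two extremes of what rand() can return. It assumes any AWK installed as awk. The variables lo and hi are stand-ins for rand() chosen for illustration (0 is the smallest value, and 0.999999 approximates the largest), so the output is repeatable.

```shell
# Compare the range produced by the old and the corrected formula.
range_check=$(awk 'BEGIN {
    MIN = 1; MAX = 36
    lo = 0; hi = 0.999999      # stand-ins for the extremes of rand()
    printf("old: %d..%d\n", int(lo * MAX + 0.5), int(hi * MAX + 0.5))
    printf("new: %d..%d\n", int(lo * (1 + MAX - MIN)) + MIN, int(hi * (1 + MAX - MIN)) + MIN)
}')
echo "$range_check"
```

The old formula can produce 0, which is marked in the array but never printed, so the output comes up one number short. It also gives 0 and MAX only half the chance of the other values, because each gets half an interval's worth of rand() results.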

String functions

I recently discussed the numeric functions available in the various incantations of AWK. This month I will finish the discussion of functions. Remember, everything in AWK is also in NAWK and the Free Software Foundation's version called GAWK. The latest version of GAWK seems to be a superset of NAWK, but I haven't tested everything yet.

Besides numeric functions, there are two other types of function: strings and the whatchamacallits. First, a list of the string functions:

String Functions

Name                            Variant
index(string,search)            AWK, NAWK, GAWK
length(string)                  AWK, NAWK, GAWK
split(string,array,separator)   AWK, NAWK, GAWK
substr(string,position)         AWK, NAWK, GAWK
substr(string,position,max)     AWK, NAWK, GAWK
sub(regex,replacement)          NAWK, GAWK
sub(regex,replacement,string)   NAWK, GAWK
gsub(regex,replacement)         NAWK, GAWK
gsub(regex,replacement,string)  NAWK, GAWK
match(string,regex)             NAWK, GAWK
tolower(string)                 GAWK
toupper(string)                 GAWK

Most people first use AWK to perform simple calculations. Associative arrays and trigonometric functions are somewhat esoteric features, that new users embrace with the eagerness of a chain smoker in a fireworks factory. I suspect most users add some simple string functions to their repertoire once they want to add a little more sophistication to their AWK scripts. I hope this column gives you enough information to inspire your next effort. There are four string functions in the original AWK: index(), length(), split(), and substr(). These functions are quite versatile.

The length function

What can I say? The length() function calculates the length of a string. I often use it to make sure my input is correct. If you wanted to ignore empty lines, check the length of each line before processing it with

if (length($0) > 0) {
. . .
}

You can easily use it to print all lines longer than a certain length, etc. The following command centers all lines shorter than 80 characters:

#!/bin/awk -f
{
if (length($0) < 80) {
prefix = "";
for (i = 1;i<(80-length($0))/2;i++)
prefix = prefix " ";
print prefix $0;
} else {
print;
}
}

The index function

If you want to search for a special character, the index() function will search for specific characters inside a string. To find a comma, the code might look like this:

sentence="This is a short, meaningless sentence.";
if (index(sentence, ",") > 0) {
printf("Found a comma in position %d\n", index(sentence,","));
}

The function returns a positive value when the substring is found. The number specifies the location of the substring.

If the substring consists of 2 or more characters, all of these characters must be found, in the same order, for a non-zero return value. Like the length() function, this is useful for checking for proper input conditions.
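
A quick sketch of all three cases, assuming any AWK installed as awk. It reuses the sentence from the fragment above; the search strings "short" and "shirt" are arbitrary choices.

```shell
# index() with a single character, a substring that is present,
# and a substring that is not.
positions=$(awk 'BEGIN {
    sentence = "This is a short, meaningless sentence."
    printf("%d %d %d", index(sentence, ","), index(sentence, "short"), index(sentence, "shirt"))
}')
echo "$positions"          # prints 16 11 0
```

Positions count from 1, so a return value of 0 can only mean "not found".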

The substr function

The substr() function can extract a portion of a string. One common use is to split a string into two parts based on a special character. If you wanted to process some mail addresses, the following code fragment might do the job:

#!/bin/awk -f
{
# field 1 is the e-mail address - perhaps 
if ((x=index($1,"@")) > 0) {
username = substr($1,1,x-1);
hostname = substr($1,x+1,length($1));
# the above is the same as
#	hostname = substr($1,x+1);
printf("username = %s, hostname = %s\n", username, hostname);
}
}

The substr() function takes two or three arguments. The first is the string, the second is the position. The optional third argument is the length of the string to extract. If the third argument is missing, the rest of the string is used.

The substr function can be used in many non-obvious ways. As an example, it can be used to convert upper case letters to lower case.


#!/usr/bin/awk -f
# convert upper case letters to lower case
BEGIN {
LC="abcdefghijklmnopqrstuvwxyz";
UC="ABCDEFGHIJKLMNOPQRSTUVWXYZ";
}
{
out="";
# look at each character
for(i=1;i<=length($0);i++) {
# get the character to be checked
char=substr($0,i,1);
# is it an upper case letter?
j=index(UC,char);
if (j > 0 ) {
# found it
out = out substr(LC,j,1);
} else {
out = out char;
}
}
printf("%s\n", out);
}

GAWK's tolower and toupper functions

GAWK has the toupper() and tolower() functions, for convenient conversions of case. These functions take strings, so you can reduce the above script to a single line:

#!/usr/local/bin/gawk -f
{
print tolower($0);
}

The split function

Another way to split up a string is to use the split() function. It takes three arguments: the string, an array, and the separator. The function returns the number of pieces found. Here is an example:

#!/usr/bin/awk -f
BEGIN {
# this script breaks up the sentence into words, using
# a space as the character separating the words
string="This is a string, is it not?";
search=" ";
n=split(string,array,search);
for (i=1;i<=n;i++) {
printf("Word[%d]=%s\n",i,array[i]);
}
exit;
}

The third argument is typically a single character. In the original AWK, if a longer string is used, only the first letter is used as a separator. NAWK and GAWK treat a longer separator as a regular expression.
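
Here is a short sketch of the return value and of a two-character separator. It assumes a POSIX-style awk (nawk or gawk) installed as awk, where a separator longer than one character is treated as a regular expression. The second string and the array names are illustrations only.

```shell
# split() returns the number of pieces and fills the array from index 1.
pieces=$(awk 'BEGIN {
    n = split("This is a string, is it not?", words, " ")
    m = split("a, b, c", parts, ", ")    # two-character separator
    printf("%d %s %d %s", n, words[4], m, parts[2])
}')
echo "$pieces"             # prints 7 string, 3 b
```

Notice that the comma stays attached to "string," in the first case, because only the space was named as the separator.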

NAWK's string functions

NAWK (and GAWK) have additional string functions, which add a primitive SED-like functionality: sub(), match(), and gsub().

Sub() performs a string substitution, like sed. To replace "old" with "new" in a string, use

sub(/old/, "new", string)

If the third argument is missing, $0 is assumed to be the string searched. The function returns 1 if a substitution occurs, and 0 if not. If no slashes are given in the first argument, the first argument is assumed to be a variable containing a regular expression. The sub() function only changes the first occurrence. The gsub() function is similar to the g option in sed: all occurrences are converted, and not just the first. That is, if the pattern occurs more than once per line (or string), the substitution will be performed once for each found pattern. The following script:

#!/usr/bin/nawk -f
BEGIN {
string = "Another sample of an example sentence";
pattern="[Aa]n";
if (gsub(pattern,"AN",string)) {
printf("Substitution occurred: %s\n", string);
}

exit;
}

prints the following when executed:

Substitution occurred: ANother sample of AN example sentence

As you can see, the pattern can be a regular expression.
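
The difference between sub() and gsub() is easiest to see side by side. This is a minimal sketch, assuming nawk, gawk or another POSIX awk installed as awk; the test string is made up for the occasion.

```shell
# sub() changes the first match only; gsub() changes them all
# and returns how many substitutions it made.
subs=$(awk 'BEGIN {
    s = "an ant and an anchor"
    t = s
    n1 = sub(/an/, "AN", s)
    n2 = gsub(/an/, "AN", t)
    printf("%d %s\n%d %s", n1, s, n2, t)
}')
echo "$subs"
```

This prints "1 AN ant and an anchor" followed by "5 AN ANt ANd AN ANchor", so the return value of gsub() is a count and not just a yes-or-no flag.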

The match function

As the above demonstrates, the sub() and gsub() functions return a positive value if a match is found. However, they have the side effect of changing the string tested. If you don't wish this, you can copy the string to another variable, and test the spare variable. NAWK also provides the match() function. If match() finds the regular expression, it sets two special variables, RSTART and RLENGTH, that indicate where the match begins and how long it is. Here is an example that does this:

#!/usr/bin/nawk -f
# demonstrate the match function

BEGIN {
regex="[a-zA-Z0-9]+";
}
{
if (match($0,regex)) {
#	RSTART is where the pattern starts
#	RLENGTH is the length of the pattern
before = substr($0,1,RSTART-1);
pattern = substr($0,RSTART,RLENGTH);
after = substr($0,RSTART+RLENGTH);
printf("%s<%s>%s\n", before, pattern, after);
}
}
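
If you just want to see the two variables, this sketch prints them for one fixed string. It assumes nawk, gawk or another POSIX awk installed as awk. The sample string, with three leading spaces, is an arbitrary choice.

```shell
# match() leaves the string alone and reports where the match is.
found=$(awk 'BEGIN {
    s = "   hello, world"
    if (match(s, /[a-zA-Z0-9]+/)) {
        printf("%d %d %s", RSTART, RLENGTH, substr(s, RSTART, RLENGTH))
    }
}')
echo "$found"              # prints 4 5 hello
```

Only the first (leftmost) match is reported, which is why "world" does not appear.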

Lastly, there are the whatchamacallit functions. I could use the word "miscellaneous," but it's too hard to spell. Darn it, I had to look it up anyway.

Miscellaneous Functions

Name            Variant
getline         AWK, NAWK, GAWK
getline <

The system function

NAWK has a function system() that can execute any program. It returns the exit status of the program.

if (system("/bin/rm junk") != 0)
print "command didn't work";

The command can be a string, so you can dynamically create commands based on input. Note that the output isn't sent to the NAWK program. You could send it to a file, and open that file for reading. There is another solution, however.
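
As a small sketch of testing the exit status, the following runs the standard true and false commands, which do nothing except succeed and fail. It assumes nawk, gawk or another POSIX awk installed as awk. I only compare the status against zero, since the exact non-zero value depends on the command.

```shell
# system() returns 0 for a command that succeeds, non-zero otherwise.
statuses=$(awk 'BEGIN {
    ok  = (system("true")  == 0)
    bad = (system("false") != 0)
    printf("%d %d", ok, bad)
}')
echo "$statuses"           # prints 1 1
```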

The getline function

AWK has a command, getline, that allows you to force the reading of a new line of input. It doesn't take any arguments. It returns a 1 if successful, a 0 if end-of-file is reached, and a -1 if an error occurs. As a side effect, the variable containing the current input line, $0, changes. This next script filters the input, and if a backslash occurs at the end of the line, it reads the next line in, eliminating the backslash as well as the need for it.

#!/usr/bin/awk -f
# look for a \ as the last character.
# if found, read the next line and append
{
line = $0;
while (substr(line,length(line),1) == "\\") {
# chop off the last character
line = substr(line,1,length(line)-1);
i=getline;
if (i > 0) {
line = line $0;
} else {
printf("missing continuation on line %d\n", NR);
}
}
print line;
}

Instead of reading into the standard variables, you can specify the variable to set:

getline a_line
print a_line;

NAWK and GAWK allow the getline function to be given an optional filename or string containing a filename. Here is an example of a primitive file preprocessor that looks for lines of the format

#include filename

and substitutes that line for the contents of the file:

#!/usr/bin/nawk -f
{
# a primitive include preprocessor
if (($1 == "#include") && (NF == 2)) {
# found the name of the file
filename = $2;
# compare with 0, so a missing file (getline returns -1) ends the loop
while ((getline < filename) > 0) {
print;
}
} else {
print;
}
}

NAWK's getline can also read from a pipe. If you have a program that generates a single line, you can use

"command" | getline;
print $0;

or

"command" | getline abc;
print abc;

If you have more than one line, you can loop through the results:

while (("command" | getline) > 0) {
cmd[i++] = $0;
}

for (i in cmd) {
printf("%s=%s\n", i, cmd[i]);
}

Only one pipe can be open at a time. If you want to open another pipe, you must execute

close("command");

This is necessary even if the end of file is reached.
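
Putting the pieces together, here is a sketch that reads every line a command produces and then closes the pipe. It assumes nawk, gawk or another POSIX awk installed as awk. The command, two echo statements, is only a stand-in for a real program, and the variable names are mine.

```shell
# Read all output of a command into an array, then close the pipe.
collected=$(awk 'BEGIN {
    cmd = "echo one; echo two"           # stand-in for a real command
    n = 0
    while ((cmd | getline line) > 0) {   # > 0 stops on end-of-file and on errors
        lines[++n] = line
    }
    close(cmd)
    printf("%d %s %s", n, lines[1], lines[2])
}')
echo "$collected"          # prints 2 one two
```

Keeping the command in a variable means the string given to close() is guaranteed to match the one used to open the pipe.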

The systime function

The systime() function returns the current time of day as the number of seconds since Midnight, January 1, 1970. It is useful for measuring how long portions of your GAWK code take to execute.

#!/usr/local/bin/gawk -f
# how long does it take to do a few loops?
BEGIN {
LOOPS=100;
# do the test twice
start=systime();
for (i=0;i<

The strftime function

GAWK has a special function for creating strings based on the current time. It's based on the strftime(3c) function. If you are familiar with the "+" formats of the date(1) command, you have a good head-start on understanding what the strftime command is used for. The systime() function returns the current date in seconds. Not very useful if you want to create a string based on the time. While you could convert the seconds into days, months, years, etc., it would be easier to execute "date" and pipe the results into a string. (See the previous script for an example). GAWK has another solution that eliminates the need for an external program.

The function takes one or two arguments. The first argument is a string that specifies the format. This string contains regular characters and special characters. Special characters start with a backslash or the percent character. The characters with the backslash prefix are the same ones I covered earlier. In addition, the strftime() function defines dozens of combinations, all of which start with "%". The following table lists these special sequences:

GAWK's strftime formats

%a  The locale's abbreviated weekday name
%A  The locale's full weekday name
%b  The locale's abbreviated month name
%B  The locale's full month name
%c  The locale's "appropriate" date and time representation
%d  The day of the month as a decimal number (01-31)
%H  The hour (24-hour clock) as a decimal number (00-23)
%I  The hour (12-hour clock) as a decimal number (01-12)
%j  The day of the year as a decimal number (001-366)
%m  The month as a decimal number (01-12)
%M  The minute as a decimal number (00-59)
%p  The locale's equivalent of the AM/PM
%S  The second as a decimal number (00-61)
%U  The week number of the year (Sunday is first day of week)
%w  The weekday as a decimal number (0-6). Sunday is day 0
%W  The week number of the year (Monday is first day of week)
%x  The locale's "appropriate" date representation
%X  The locale's "appropriate" time representation
%y  The year without century as a decimal number (00-99)
%Y  The year with century as a decimal number
%Z  The time zone name or abbreviation
%%  A literal %

Depending on your operating system, and installation, you may also have the following formats:


Optional GAWK strftime formats

%D  Equivalent to specifying %m/%d/%y
%e  The day of the month, padded with a blank if it is only one digit
%h  Equivalent to %b, above
%n  A newline character (ASCII LF)
%r  Equivalent to specifying %I:%M:%S %p
%R  Equivalent to specifying %H:%M
%T  Equivalent to specifying %H:%M:%S
%t  A TAB character
%k  The hour as a decimal number (0-23)
%l  The hour (12-hour clock) as a decimal number (1-12)
%C  The century, as a number between 00 and 99
%u  The weekday as a decimal number (Monday is day 1)
%V  The week number of the year (using ISO 8601)
%v  The date in VMS format (e.g. 20-JUN-1991)

One useful format is

strftime("%y %m %d %H %M %S")

This constructs a string that contains the year, month, day, hour, minute and second in a format that allows convenient sorting. If you ran this at noon on Christmas, 1994, it would generate the string

94 12 25 12 00 00

Here is the GAWK equivalent of the date command:

#! /usr/local/bin/gawk -f
#

BEGIN {
format = "%a %b %e %H:%M:%S %Z %Y";
print strftime(format);
}

You will note that there is no exit command in the BEGIN statement. If I were using AWK, an exit statement would be necessary. Otherwise, it would never terminate. If there is no action defined for each line read, NAWK and GAWK do not need an exit statement.

If you provide a second argument to the strftime() function, it uses that argument as the timestamp, instead of the current system's time. This is useful for calculating future times. The following script calculates the time one week after the current time:

#!/usr/local/bin/gawk -f
BEGIN {
# get current time
ts = systime();
# the time is in seconds, so
one_day = 24 * 60 * 60;
next_week = ts + (7 * one_day);
format = "%a %b %e %H:%M:%S %Z %Y";
print strftime(format, next_week);
exit;
}

User defined functions

Finally, NAWK and GAWK support user defined functions. This function demonstrates a way to print error messages, including the filename and line number, if appropriate:

#!/usr/bin/nawk -f
{
if (NF != 4) {
error("Expected 4 fields");
} else {
print;
}
}
function error ( message ) {
if (FILENAME != "-") {
printf("%s: ", FILENAME) > "/dev/tty";
}
printf("line # %d, %s, line: %s\n", NR, message, $0) >> "/dev/tty";
}
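
Functions can also return a value. As a minimal sketch, here is a hypothetical abs() function, which is not built into AWK, that does the same job as the "make difference positive" test in the square root script. It assumes nawk, gawk or another POSIX awk installed as awk.

```shell
# A user defined function that returns a value.
values=$(awk '
function abs(x) {
    return (x < 0) ? -x : x
}
BEGIN {
    printf("%s %s", abs(-3.5), abs(2))
}')
echo "$values"             # prints 3.5 2
```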

One more note: one of my readers mentioned that I forgot to explain the use of ">>" with the AWK print statement. Like the shell, the double angle brackets indicate that output is appended to the file, instead of written to an empty file. Appending to the file does not delete the old contents. However, there is a subtle difference between AWK and the shell.

Consider the shell program:

#!/bin/sh
while x=`line`
do
echo got $x >>/tmp/a
echo got $x >/tmp/b
done

This will read standard input, and copy the standard input to files "/tmp/a" and "/tmp/b". File "/tmp/a" will grow larger, as information is always appended to the file. File "/tmp/b", however, will only contain one line. This happens because each time the shell sees the ">" or ">>" characters, it opens the file for writing, choosing the truncate/create or appending option at that time.

Now consider the equivalent AWK program:

	#!/usr/bin/awk -f
	{
	print $0 >>"/tmp/a"
	print $0 >"/tmp/b"
	}

This behaves differently. AWK chooses the create/append option the first time a file is opened for writing. Afterwards, the use of ">" or ">>" is ignored. Unlike the shell, AWK copies all of standard input to file "/tmp/b".
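
You can check this yourself without touching /tmp/a or /tmp/b. The sketch below assumes a POSIX shell with mktemp available and nawk, gawk or another POSIX awk installed as awk. The scratch file and the three input lines are stand-ins I made up.

```shell
# Write three lines through a single ">" redirection and count what survives.
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' | awk -v out="$tmp" '{ print $0 > out }'
count=$(wc -l < "$tmp" | tr -d ' ')
echo "$count"              # prints 3
rm -f "$tmp"
```

All three lines are in the file, even though ">" was used each time. The file was truncated once, when AWK first opened it, and every later print went to the same open file.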

Introduction to Mailtool

Reading electronic mail is a very personal action. I've noticed people become attached to their mail reader with a passion that approaches their attachment to their favorite editor, political party and salad dressing. With this in mind, trying to convince someone to switch to Sun's mailtool utility should be comparable to convincing a cat that a ride in a washing machine is fun. Nevertheless, I will try. Mailtool started out as a wrapper around the Berkeley mail program /usr/ucb/mail (which is different than /bin/mail). Originally, it was a SunView-based program, and evolved into the current version. The binary file contains a copyright of 1987, and it has changed a bit in the last 8 years.

Many OpenWindows applications suffer from an all-or-nothing syndrome. Using a single OpenWindows application will be painful. On the other hand, if you use only OpenWindows applications, other applications seem awkward and cause frustrations. Mailtool is a primary focus of this dichotomy, and consequently it gets a lot of flak. There is a lot of synergy between mailtool and other OpenWindows programs, which is good or bad, depending on your view. If you use textedit, mailtool will be easy to learn. If not, it will be painful. Mailtool always uses the textedit editor. You cannot change this. I have already discussed textedit in depth, so I will assume you understand the basics. Textedit isn't the most powerful editor in the world. It is easy to use, and easy to extend, by adding filters and one or two key bindings. I don't use textedit when I write programs. It isn't a perfect editor. However, for writing mail messages it is more than sufficient. I run a moderated mailing list on the Internet and manage the address changes, formatting, and editing using mailtool and textedit. The power is there. Most people don't know how to use it.

Mailtool also interacts with other programs, using the drag-and-drop paradigm. You can drag messages to and from the file manager, print tool, tape tool, textedit and even the calendar manager. I'll go into this later. First on the agenda, an introduction to mailtool.

Mailtool basics

One of the nicest features of mailtool is its ease of use. Many people use it without reading the manual. The OPEN LOOK interface helps here. The MENU button on the mouse will bring up a pop-up menu. The SELECT mouse key will perform an action, or select the object you want the pop-up menu to manipulate. You can move the mouse anywhere inside the window, and press the HELP function key. This plus an intuitive interface makes mailtool painless to a lot of beginners.

Another nice feature of mailtool is the flexibility of having several ways to perform the same operation. A pop-up menu button has the DELETE function, as well as a single-purpose button. In fact, there are at least 10 different ways to delete mail messages using mailtool. Most people only know three or four ways, because the first and most intuitive method they tried worked and they never needed to find the other techniques. They are there, but I'll cover them later.

Let's look at the main command panel of mailtool. On top, there are eight button menus, three buttons, a text-entry field with the label, and a small triangle inside a small box called an abbreviated menu button. All items on this panel that have the triangle have a pop-up menu associated with them; the triangle indicates this. Underneath this panel is the list of mail messages in your In-Box. You can simply double-click on one of these messages to read it, or you can use the buttons on the main command panel.

Review of menu buttons operations

As I mentioned earlier, the oval buttons with the triangles are Menu Buttons. If you position the mouse over these buttons, and press MENU (typically the right mouse button), a pop-up menu will appear. If you hold the MENU button, you can drag the mouse to your choice and release the button. If you click the mouse button once, the menu will stay in place, allowing you to click a second time to either make your choice, or dismiss the menu. Some like this Click-Move-Click better than the Press-Drag-Release, and claim it stresses the hand less. You can choose either method at any time.

There are several ways to accelerate these actions if you expect to perform them often. For example, you can pin the menu in place, by moving or clicking on the pushpin icon in the menu. Newer versions of mailtool have keyboard equivalents to certain commands. The equivalents are displayed in the menu. Meta-X is the accelerator for the delete function. (The Meta key is marked with a small diamond on a Sun keyboard.)

Each menu button has a default choice with an oval line around this choice. If you position the mouse over the menu button and press the SELECT mouse key, the default choice will be performed. The name of this action is previewed inside the button, if you forgot what the default is. You can change the default by holding down the Control function key modifier while dragging the mouse to that choice. When you do this, the black oval slides down to the new default selection. You can then select this action using the SELECT mouse button next time. Remember that I said there are several different ways to perform the same action? You just learned four methods for each of these buttons, five if it has a keyboard equivalent.

It's time to examine the particulars. There are subtle divisions in the command panel. The left side of the panel deals with basic operations; the right manages file folders. The upper left quadrant contains four predefined menu buttons. The lower left quadrant contains user-defined buttons, or accelerators. We'll go in order.

The permanent menu buttons

There are four permanently defined menu buttons in the upper left quadrant: File, View, Edit and Compose.

The File Menu button

Operations associated with starting and quitting mailtool are in this menu button. This menu contains the following choices, listed below with the keyboard equivalents (if available):

	Load In-Box (Meta-o)
	Print (Meta-p)
	Save Changes (Meta-s)
	Done
	Mail Files...

The default command loads your In-Box, or folder containing incoming mail. Print sends a copy of the selected message or messages to the printer. Save Changes and Done are very important commands and you should execute these commands frequently. Mailtool works with a copy of your incoming mail messages, and pressing either button makes the changes permanent. This makes sure your changes get saved to the disk.

Earlier versions had problems if the system or program crashed, or you started reading your mail with another copy of Mailtool. Another potential problem might occur if you decide to read mail from a remote site, using mail or some other reader. I have developed the habit of saving my changes at the end of the day, so I can read my mail at night from home. The Done command is a very handy way to do this, because it saves changes and closes the window at the same time. There is a disadvantage to saving changes: you can no longer undo any file deletions.

The "Mail Files..." command opens a new window for selecting file folders. More on this later.

The View Menu button

There are five choices in this menu.

	Messages >
	Previous
	Next
	Sort By >
	Find...

"Next" and "Previous" should be obvious, and allow you to go up or down to read your next message. Of course you can double click on the message in the main window if you prefer. This menu allows you to change the order of the messages, and how you see them using the "Sort By" menu. This allows you to group all of the messages with the same sender, same subject, etc. together. Or you can rearrange them by time, or by size (to delete old or big messages). You can search for messages (more on this later), and select either an abbreviated view of the message, or the complete message, showing all of the header.

The Edit Menu button

Again, there are five choices:

	Cut (Meta-X)
	Copy (Meta-C)
	Delete
	Undelete >
	Properties...

There are two main uses for this menu: undeleting messages, and editing properties. The second allows you to customize mailtool to meet your needs. The first is an "oops" action. This menu button also allows you to delete messages, but I rarely use this as there is a "Delete" button below.

The Compose Menu button

The four functions are:

	New (Meta-n)
	Reply >
	Forward
	Vacation >

I use this menu to forward messages. If I want to create a new message, I SELECT this button. This menu also allows you to reply to messages, or you can use the menu button below. When you reply, you can include the original message or not. You can also reply to everyone who received the original message, or just reply to the sender. The other use of the menu is to turn on/turn off a special "I'm on Vacation" message.

The User Defined button

Underneath the four permanent menu buttons are four user-defined buttons. You can modify them by editing mailtool's properties, but I normally leave them the same. I can always pin the other menus into place, or change the default of these menus, if I need to temporarily change the buttons. The four default buttons are "Done", "Next", "Delete", and the "Reply" menu button.

On the right side of the main command panel are operations that select various folders, used to store your mail. I'll talk about that next month.

So that's an overview of mailtool. I will be going into a lot more detail later. If you have any questions, send them to me and I will try to find a solution to your problem in the months to come. I will discuss adding signatures, formatting, and sophisticated searching mechanisms, as well as methods to manage your mail folders. Stay tuned!

Mail folders: finding old messages

As I suggested last month, I often save messages with the intention of reading them later, assuming I can find them again. That is this month's problem -- finding old messages.

The simplest technique is to look at all of the mail messages in the mail header window. Use the scrollbar to search for the desired message. Use the arrow keys to move up or down a message, or just double-click on a message to view it. You can make the header window larger. Either resize it, or double-click on the border. The latter makes the window grow to full height. Another double-click returns the window to the previous size.

You can make the size of this window permanently larger. Select the "Edit=>Properties" button, and then select the category "Header Window." Look for the "Display:" section, and adjust the values for "Headers" and "Characters Wide". Then "Apply" the changes to make them permanent.

Finding a message when you know something about the mail header is easy. The first step is to open the "Find Messages" window. There are several ways to do this. Move the mouse inside the mailtool window. Then you can do one of the following:

  1. Choose the "View" pop-up menu, and select "Find ..."
  2. Press the "Find" function key
  3. Press Meta-F.

Normally, the "Find" function searches for a word inside a file. In the "Mailtool Headers" window, the function behaves differently, but intuitively. The "Find Messages" pop-up window appears, with five text fields and four buttons. The five text-entry fields allow you to specify the sender, the receiver, the subject line or the person getting a carbon copy. Fill in what you know and click "Find Forward". The next message meeting these conditions will be selected and displayed. If you select a "From:" and a "Subject:" condition, both must be true for mailtool to select the message.

The search ignores the case of the letters, so upper/lower case does not matter. One exception is the "To/Cc:" field. Specifying a pattern here will match either mail header. Unfortunately, earlier versions of mailtool have a bug. If you have such a version, the "To/Cc:" field acts like the "To:" field. If you want to check both conditions, you have to perform two searches. I do this by putting the string in the "To/Cc:" field and pressing SELECT. After I have moved, deleted or printed the messages, I double- or triple-click on the pattern, and drag and drop it to the "Cc:" field.

It is worthwhile to review some of the features of the "Find Messages" window. Characters can be edited using the mouse to select and the function keys to cut and paste. The six delete operations are also available (Delete, Shift-Delete, Control-W, Shift-Control-W, Control-U, Shift-Control-U). The Tab key will move to the next text-entry field. Shift-Tab will move to the previous text field.

The "Find Forward" button will find the next mail message. Because it has a double oval around the button, this is the default option. Pressing the Return key does the same as clicking on the button. The "Find Backward" button will search in the opposite direction. The "Select All" button will select all messages that match this pattern. A message will appear on the bottom of the window, stating how many messages were selected. Once this is done, you can print, delete, copy or move these messages. The "Clear" button erases all text entries.

If you find yourself frequently using the "Find Messages" window, you can pin it into place. It will remain there, but go away if you close mailtool. If you desire, you can change one of the four customizable buttons to provide the "Find" function. Select the "Edit=>Properties" button, and select the "Mail Header" category. Then select one of the four adjacent buttons, with the current labels displayed. Go to the "Command" pop-up menu and select the "Find" command. Click "Apply", and the button will be changed to the "Find" function.

You can combine other features of mailtool when searching for a message. If you had all of your mail in a single folder, you could select all messages with a certain condition, and copy them to a new folder. Then you could select a different condition, and copy these also to the same new folder. Then you change to the new folder, sort by date, and then search through the messages, one at a time. You can also use the "Find Messages" window to select a particular subject or address. In this manner, you can effectively build up complex combinations.

That is, if you really want to. I think it's too time consuming. I prefer to use grep to search for a pattern. Once I find it, I open the folder and read the message. There is a problem with this. The grep command will print out the line that matches, but does not tell you which message contains this line. When you have a thousand messages and don't know which one contains the information you want, finding the right message can be difficult. I wrote a sed script that acts like grep, but prints out the "From" and "Subject:" fields as well as the line that matches. I call it "Subject_grep".

#!/bin/sh
# Subject_grep - written by Bruce Barnett
# Function - Search for pattern inside mail message
#	prints out From, Subject: and matching line
# usage
#	Subject_grep pattern [files...]
#
case $# in
	0) echo "Usage: $0 pattern [files...]";exit 1;;
	*);;
esac;

#there is a potential bug - if the pattern contains a "/"
# then sed will break
#PAT=$1;shift
PAT=`echo "$1"|sed 's;/;\\\/;g'`
shift
# use sed -n to disable printing, unless we explicitly request it
# NOTE - I have sed comments starting with "#" - not all versions of sed
# allow this

sed -n '
/^From/,/^$/ {
	/^From / {
# remember the from line
# put current From in hold buffer
		x
# now delete the last one, which is in the working buffer
		d
	}
	/^Subject: / {
# append to hold space
		H
# now delete the last one, which is now in the pattern space
		d
	}
}
# found the string
/'"$PAT"'/ {
# switch hold and pattern space
	x
# print current pattern space
	p
# get the original line back
	x
# print it
	p
# add a marker
	a\
---
}' "$@"

If I wanted to search several mail folders for the topic of "kumquats", I would type:

cd $HOME/Mail
Subject_grep kumquats Folder1 Folder2 Folder3
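
If all you need to know is which message in a folder holds the match, a smaller tool will do. The following is a minimal sketch and not part of the original script: "which_message" is a hypothetical helper name, and it assumes a POSIX awk (nawk on a Sun) and folders in the usual format, where every message starts with a "From " line.

```sh
#!/bin/sh
# which_message PATTERN FOLDER...
# Hypothetical helper: for every line that matches PATTERN, print
# the folder name, the ordinal number of the message, and the line.
which_message() {
    pattern=$1
    shift
    awk -v pat="$pattern" '
        FNR == 1 { msg = 0 }    # restart the count for each folder
        /^From / { msg++ }      # a "From " line begins a new message
        $0 ~ pat { printf "%s:%d:%s\n", FILENAME, msg, $0 }
    ' "$@"
}
```

Calling "which_message kumquats Folder1 Folder2" prints one line per match in the form folder:message-number:text, so you know which message to open.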

However, this will not work on a compressed file, and specifying a complex subdirectory structure is awkward. You cannot simply give it a directory as an argument. Assuming you name files using the convention I described last month, here is a version that will search through folders, uncompressing ones that need to be examined, then running "Subject_grep" on the uncompressed file. It allows you to specify the subject of the mail folder, the year and/or the month. If you forget the options, a "-a" option will prompt you for missing options.


#!/bin/sh
# search_for_folder - written by Bruce Barnett

# defaults = where do I keep my mail?

MAILF=$HOME/Mail

# this time, I will use some Bourne shell functions for usage() errors
usage() {
	# report proper usage
	echo "usage: $0 [-a] [-s subject] [-y year] [-m month] pattern"
	exit 1;
}

# the -a option will manually ask for missing arguments
# use this if you forget them
ask() {
	# ask for all arguments

	if [ -z "$SUB" ]; then
		echo " (-s) What is the subject of the folder?"
		read SUB;
	fi
	if [ -z "$YR" ]; then
		echo " (-y) What is the year?"
		read YR;
	fi
	if [ -z "$MON" ]; then
		echo " (-m) What is the month?"
		read MON;
	fi
}

# initialize
ASK=0;
PAT="";

# examine arguments
while [ $# -gt 0 ]
do
	case "$1" in
	-a) ASK=1;;
	-y) shift;YR=$1;;
	-m) shift;MON=$1;;
	-s) shift;SUB=$1;;
	-*) usage;;
	# join the pattern words with one space, without a leading space
	*) PAT="${PAT:+$PAT }$1";;
	esac
	shift
done

# What are we searching for?

if [ -z "$PAT" ]; then
	# error
	usage
fi

# do we need to ask for arguments?

if [ "$ASK" -eq 1 ]; then
	ask
fi

# construct the filename pattern
# if not defined, assume a "*" is used
SEARCH="*${SUB:-*}*${YR:-*}*${MON:-*}"

for file in `find $MAILF -type f -name "$SEARCH" -print`
do
	# must remove .Z or .gz at end of filename, if there
	basename=`echo $file | sed -e 's/\.Z$//' -e 's/\.gz$//'`
	if [ -f ${basename}.Z ]; then
		uncompress $file
		file=$basename
	elif [ -f ${basename}.gz ]; then
		gunzip $file
		file=$basename
	fi
	grep -s "$PAT" $file >/dev/null && {
		echo "========";echo "${file}::";echo "========"
		Subject_grep "$PAT" $file
	}
done
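
The SEARCH assignment in the script above leans on the shell's "${VAR:-word}" expansion, which substitutes word when the variable is unset or empty. Here is a small sketch of just that step, so you can see what the pattern looks like before find gets it. The function name "build_pattern" is made up for this illustration.

```sh
#!/bin/sh
# build_pattern SUBJECT YEAR MONTH
# Hypothetical helper: print the filename pattern the search script
# would hand to find. An empty argument falls back to "*".
build_pattern() {
    sub=$1
    yr=$2
    mon=$3
    # the double quotes keep the shell from expanding the "*" here
    echo "*${sub:-*}*${yr:-*}*${mon:-*}"
}
```

With all three values, 'build_pattern kumquats 93 Jan' prints "*kumquats*93*Jan". Leave the year empty and the result is "*kumquats***Jan", which find treats the same as a single "*" in that position.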

There is only one minor problem left. Last month I discussed how to compress old mail folders. This month I give you a program that uncompresses the folders. All my hard work wasted. Well, not really. Here is a program that you can run using cron every week. It looks for folders that haven't been examined in a week, and compresses them.


#!/bin/sh
# Compress_old_folders - Bruce Barnett
# usage:
#	(no arguments - this is usually run by cron)

# defaults = where do I keep my mail?
MAILF=$HOME/Mail/Old

# which compress program do I use?
COMPRESS=compress
#COMPRESS=gzip

# how old does a file have to be before it is compressed
# I like 7 or more days
AGE=+7

# And how big?
# don't compress very small files
# I suggest > 2 blocks
SIZE=+2

for file in `find $MAILF -type f -mtime $AGE -size $SIZE -print`
do
	basename=`echo $file | sed -e 's/\.Z$//' -e 's/\.gz$//'`
	# skip folders that are already compressed
	if [ ! -f ${basename}.Z -a ! -f ${basename}.gz ]; then
		#echo	$COMPRESS $file
		$COMPRESS $file
	fi
done
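
Before letting cron loose on your folders, you may want to see what would be compressed. The sketch below is a dry run under my own assumptions: "list_stale_folders" is a hypothetical name, and unlike the script above it skips compressed folders by looking at the file name suffix rather than testing for a matching ".Z" or ".gz" file.

```sh
#!/bin/sh
# list_stale_folders DIR [AGE] [SIZE]
# Hypothetical dry run: print the folders that are old enough and
# big enough to compress, without touching them.
list_stale_folders() {
    dir=$1
    age=${2:-+7}     # modified more than 7 days ago
    size=${3:-+2}    # larger than 2 blocks of 512 bytes
    find "$dir" -type f -mtime "$age" -size "$size" \
        ! -name '*.Z' ! -name '*.gz' -print
}
```

Running 'list_stale_folders $HOME/Mail/Old' prints one candidate per line. If the list looks right, the real script can be scheduled with some confidence.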

Mail Folders: Exploring more properties

The compose window options modify the format of new messages. The first option specifies the prefix used to indicate included text. Normally this is the string "> ", but you can modify it to be almost any string. Tabs are not allowed (although a tab was the default in an early version of mailtool).

When you reply to a message and include the original message, these characters are placed in front of each line. Also, if you include any other message using the Include=>Indented selection, mailtool adds these characters before each line. Personally, I like to remove unnecessary lines. This is easy to do with a simple filter, which I call "WhoSez":

#!/bin/sh
# WhoSez - Bruce Barnett
# read a quoted mail message, change the attribution
# and delete unnecessary lines:
# usage - as a filter to mailtool
sed '
/> From/,/^> $/{
/From /d
/Date:/d
/To:/d
/Subject:/d
/Cc:/d
/Content-Length:/d
s/From: \(.*\)/\1 says.../
}
# I also like to see blank lines stay blank.
s/^> *$//
'

Create this file, make it executable, and place it in your search path. Then add a line to your ".text_extras_menu" file:

#Place this in your .text_extras_menu
"MailFrom" WhoSez

Or place a line in your ".textswrc" file:

#Place this in your .textswrc file
KEY_TOP(3) FILTER
WhoSez

Select the text, and either call up the menu selection, or press function key number 3 on top of your keyboard.

Saving all your mail

You can keep a record of all of your outgoing mail. Specify a filename in the "Logged Messages File" field, and select "Log All Messages."

Another option in this window is "Request Confirmations." If you cancel a message you are composing, you may lose information, or an attachment, without realizing it. You may also unknowingly modify an old message, which may corrupt mail folders. Therefore, I recommend you leave it selected.

The "Show attachment list" option allows you to select if you want to show the attachment window when you compose new messages. You can always hide or show this window, but this selects the default.

There is one more useful option in this window -- the ability to add additional headers. Some sample mail headers follow:

Precedence: bulk
Reply-To: barnett@crd.ge.com
Return-Receipt-To: barnett@crd.ge.com

The "Precedence" header is a feature of some mail delivery agents (e.g. sendmail). If a message with a precedence of "junk" cannot be delivered, sendmail will not report the bounced message. Instead, it is discarded. Also, if the vacation program gets a message with a precedence of "bulk" or "junk," it will not reply to the message. These values are useful for large mailing lists. Other values can be used, and a priority can be set for each one, but adding values here requires adding lines to the sendmail configuration file.

The "Return-Receipt-To" header, if included, will send the person listed a notification when the mail box receives the mail message. Note that this does not tell you if a human reads the message, merely that it was delivered.

The "Reply-To" header is useful if you want to modify the default return address. Some systems generate incorrect, or perhaps not the best, return address. For instance, some sites may have a global set of aliases, shared among many mail servers. It is better to have the global address used in your mail messages, in case someone replies to an old message, and the address lists a mail server that no longer exists. It's also less typing for your friends. If the "Reply-To" header exists, and the person replies to your message, the default address will be the one listed in the "Reply-To" header instead of the "From" field.

To add one of these headers, type in the name of the Header Field and default value. Then press the "Add" and then the "Apply" buttons. When you compose a mail message, the "Header" pop-up menu will show each additional header field you can add. If you want to add a header field to all messages, instead of some, a different technique is used. I'll discuss this in the next article.
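
To see how these headers sit in an actual message, here is a rough sketch that builds one by hand, outside of mailtool. The function name "make_bulk_message" and the addresses in the usage note are invented for the example. It simply writes the headers, a blank line, and then whatever body arrives on standard input.

```sh
#!/bin/sh
# make_bulk_message TO REPLY_TO SUBJECT
# Hypothetical helper: print a complete message on standard output.
# Headers come first, then one blank line, then the body from stdin.
make_bulk_message() {
    to=$1
    reply_to=$2
    subject=$3
    printf 'To: %s\n' "$to"
    printf 'Reply-To: %s\n' "$reply_to"
    printf 'Precedence: bulk\n'
    printf 'Subject: %s\n' "$subject"
    printf '\n'
    cat
}
```

One way to use it, assuming sendmail lives in /usr/lib and the body is in a file called "body.txt", is 'make_bulk_message list@example.com owner@example.com "Weekly notes" < body.txt | /usr/lib/sendmail -t'. The "-t" flag tells sendmail to take the recipients from the headers.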

Mail filing properties

I've already discussed the mail filing window. This allows you to change your default mail directory, add permanent folders to the Move/Copy/Load buttons, and to make the size of the menu larger.

Template properties

If there are certain pieces of information you want to commonly include, you can make them a template. Simply stated, you pick a name for a template and a file that corresponds to this template. Press the "Add" and the "Apply" buttons. Then in the Composition window, the Include=>Templates pop-up menu will list these choices. By default, the system usually gives you one template:

$OPENWINHOME/share/xnews/client/templates/calendar.tpl

I have two additional templates, which act as signatures, when I want to include them. One is a one-line signature, the other includes name, address and phone number, favorite color, astrological sign and preferred beverage.

Most people don't realize how useful the calendar template is. Let me briefly describe how it can be used:

  1. A new mail message is created. The "Subject" and "To" lines are manually filled in by the sender.
  2. The calendar template is inserted into the message using the Include=>Templates menu.
  3. Control-Tab is pressed. This moves the insertion point to the "Date" field and selects the portion of text to be deleted. Notice how the date format is shown as "mm/dd/yy". The sender enters the numbers in the order shown, using the same number of digits as suggested.
  4. Control-Tab is pressed again, and the other fields are filled in, one at a time.
  5. The sender presses the "Deliver" button.
  6. When the message arrives, the receiver drags it to the calendar manager and drops it. This automatically creates the appointment.

The little-used Control-Tab feature of textedit allows forms to be quickly filled in. To make a form suitable for a template, just create the form, include the text to be replaced within "|>....<|" and add this file to your templates.
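
As an illustration, here is a sketch of a script that writes a small form of this kind. Everything in it is an assumption made for the example: the name "make_form", the field labels, and the placeholder text. The only thing taken from textedit is the idea of wrapping each replaceable field in "|>" and "<|".

```sh
#!/bin/sh
# make_form FILE
# Hypothetical helper: write a three-field form to FILE. Each field
# to be filled in is wrapped in |> and <| so that Control-Tab can
# jump from one field to the next.
make_form() {
    cat > "$1" <<'EOF'
Name:   |>your name<|
Room:   |>room number<|
Reason: |>why you need it<|
EOF
}
```

After running 'make_form $HOME/room-request.tpl' (the file name is up to you), add the file in the Template properties window as described above.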

I should mention that besides templates, there are three other ways to add signatures to your mail messages. The second method is defining a function key to perform this function in the ".textswrc" file:

#Place this in your .textswrc file
KEY_TOP(3) FILTER
cat - $HOME/.signature

The third is to add a line to your ".text_extras_menu" file:

Sign	cat - $HOME/.signature

The "-" is used so that any text selected will not be deleted. You can quickly add a signature by clicking the text window four times, and selecting one of these actions. Because of the hyphen, you do not have to be at the end of the message to append it.

The fourth technique, in which signatures can be added automatically, will be discussed in the next article.

Aliases

The "Aliases" property window is identical to the "Header=>Aliases..." selection in the Compose window. An alias allows you to specify electronic mail nicknames. That is, instead of providing a long, forgettable e-mail address, you specify a short, easy to remember abbreviation. You can alias "djbill" to the longer name "William_Williams@fm.radio.edu" and save your fingers some typing. This is a simple operation. You specify the alias and the address, press "Add" and press "Apply". If you want, you can edit the lines in your ".mailrc" file instead.

Some people have asked me how to handle more sophisticated problems. The first useful technique is creating an alias to a program. I created the alias "printer" to have the value "| mp -l | lpr". If I include the user "printer" on a mail message, I get a printed copy. If you want to do file redirection, or use variables, you should create a script and specify the script's name to mailtool. Let the script handle complex shell commands. Mailtool's invocation doesn't duplicate all of the features of a shell.

The second useful technique is creating mailing lists. You can alias a name to a list of people, separating each name with a space.
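
If you prefer editing ".mailrc" directly, each alias is a single line: the word "alias", the nickname, and then the addresses separated by spaces. The sketch below appends such a line to a file. The name "add_alias" and the addresses in the usage note are hypothetical, and the function does no checking for duplicates.

```sh
#!/bin/sh
# add_alias MAILRC NAME ADDRESS...
# Hypothetical helper: append one alias line to the given mailrc file.
add_alias() {
    mailrc=$1
    name=$2
    shift 2
    # "$*" joins the remaining addresses with single spaces
    echo "alias $name $*" >> "$mailrc"
}
```

For example, 'add_alias $HOME/.mailrc kumquat-fans alice bob@example.org' appends the line "alias kumquat-fans alice bob@example.org". Make a copy of your ".mailrc" first if you are nervous.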

You can attach several names to a list. If an alias is used more than once, the names on the right do not replace the old names; they are added to the list. So you can have an alias called "list" occur several times. This is one way to create large mailing lists. Another way is to create several smaller lists (e.g. "list1", "list2", etc.) and define alias "list" to send mail to "list1" and "list2". A third way is to construct a mail message, and use "/usr/ucb/mail" or one of the equivalent names, like "mailx" or "Mail," to send the file. If the list of names was in a file called "Names," one name per line, the following would send a message to those on the list:

Mail -s "This is the subject" `cat Names` <
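
A names file that has been edited by hand tends to pick up blank lines. Here is a slightly more forgiving sketch than a bare cat. The name "recipients_from" is my invention, and it assumes the address is the first word on each non-blank line.

```sh
#!/bin/sh
# recipients_from FILE
# Hypothetical helper: print the first word of every non-blank line
# in FILE, joined by single spaces, ready for a command line.
recipients_from() {
    awk '
        NF  { printf "%s%s", sep, $1; sep = " " }
        END { if (sep != "") print "" }
    ' "$1"
}
```

You would use it in place of the cat, as in 'Mail -s "This is the subject" `recipients_from Names` < message.txt', where "message.txt" stands for whatever file holds your message.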

This type of mailing list is all right for casual lists, but it has several weaknesses. The first, and most important, is that this list is private. Only you can use this list. If anyone else wants to use a similar list, they must create their own. If you modify your version of the list, others do not get this modification.

A second problem is that the mail headers look very long. You can hide the large list using the blind carbon copy option, i.e.

Mail -s "This is the subject" -b "`cat Names`" $LOGNAME

but this is awkward, and earlier versions of mail do not support this option.

The best way to handle this is to create a mailing list using sendmail's "/etc/aliases" file. A typical entry is

XYZ: :include:/usr/local/list/XYZ-list
XYZ-request: user@address
owner-XYZ: user@address

Replace XYZ with the name of the mailing list, and replace user@address with the electronic mail address of the person who maintains the list. The file "/usr/local/list/XYZ-list" is created and owned by the list maintainer. It contains the membership of the list, one person per line, using the standard address format found in your typical mail message.

Expert properties

There are two or three additional options in the "Expert" category. If you want to include your own name when you reply to everyone, select the "meto" option. If you want to ignore host names in the address, select the "allnet" option. The third option is new, and is used if you want to use the network aware mail file locking system.